6 research outputs found

    Sequential Action Selection for Budgeted Localization in Robots

    Get PDF
    Recent years have seen fast growth in the number of applications of Machine Learning algorithms from Computer Science to Robotics. Nevertheless, while most such attempts were successful in maximizing robot performance after a long learning phase, to our knowledge none of them explicitly takes the budget into account in the algorithm evaluation, e.g. a limit on the learning duration or on the maximum number of actions the robot can perform. In this paper we introduce an algorithm for robot spatial localization based on image classification using a sequential budgeted learning framework, which allows policies to be learned under an explicit budget. Here the model uses a constraint on the number of actions the robot can take. We apply this algorithm to a localization problem in a simulated environment. Our approach reduces the problem to a classification task under a budget constraint. The model is compared, on the one hand, to simple neural networks for the classification part and, on the other hand, to different policy-selection techniques. The results show that the model can learn an efficient policy (i.e. alternating between sensor measurements and movements to gather additional information at different positions) in order to optimize its localization performance under each tested fixed budget.
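
    The budgeted framework described in this abstract can be pictured as a loop that spends a fixed number of actions, each either a noisy sensor reading or a movement, before committing to a location estimate. The following Python sketch is only an illustration under invented assumptions (a 1-D corridor, a histogram Bayes filter in place of the paper's image classifier, and a hand-written alternating policy in place of the learned one):

        # Minimal sketch, not the authors' implementation: a robot on a 1-D circular
        # corridor must name its cell after spending a hard budget of actions, each
        # either a noisy "sense" or a deterministic one-cell "move".
        import random

        N_CELLS, SENSE_NOISE = 10, 0.2

        def sense(true_pos):
            # Return the true cell with prob 1 - SENSE_NOISE, otherwise a random other cell.
            if random.random() > SENSE_NOISE:
                return true_pos
            return random.choice([c for c in range(N_CELLS) if c != true_pos])

        def localize(true_pos, budget, policy):
            belief = [1.0 / N_CELLS] * N_CELLS
            for t in range(budget):                      # the budget caps the total number of actions
                if policy(t) == "sense":
                    obs = sense(true_pos)
                    belief = [b * ((1 - SENSE_NOISE) if i == obs else SENSE_NOISE / (N_CELLS - 1))
                              for i, b in enumerate(belief)]
                    norm = sum(belief)
                    belief = [b / norm for b in belief]
                else:                                    # "move": step one cell to the right
                    true_pos = (true_pos + 1) % N_CELLS
                    belief = belief[-1:] + belief[:-1]   # shift the belief with the robot
            guess = max(range(N_CELLS), key=lambda i: belief[i])
            return guess == true_pos

        alternate = lambda t: "sense" if t % 2 == 0 else "move"   # fixed sense/move alternation

        for budget in (2, 4, 8):
            trials = 2000
            acc = sum(localize(random.randrange(N_CELLS), budget, alternate) for _ in range(trials)) / trials
            print(f"budget={budget:2d} actions -> localization accuracy ~ {acc:.2f}")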

    Active learning under budget constraint in robotics and computational neuroscience. Robotic localization and behavioral modeling in a non-stationary environment

    No full text
    Decision-making is a highly researched field, be it in neuroscience, to understand the processes underlying decision-making in animals, or in robotics, to model efficient and fast decision-making in real environments. In neuroscience, this problem is solved online with sequential decision-making models based on reinforcement learning. In robotics, the primary objective is efficiency, so that systems can be deployed in real environments. However, what can be called the budget in robotics, namely the limitations inherent to the hardware, such as computation time, the limited set of actions available to the robot, or the lifetime of the robot's battery, is often not taken into account at present. In this thesis we propose to introduce the notion of budget as an explicit constraint in robotic learning processes applied to a localization task, by implementing a model based on work developed in statistical learning that processes data under an explicit budget constraint, either limiting the amount of input data or imposing a more explicit time constraint. In order to consider an online version of this type of budgeted learning algorithm, we also discuss possible sources of inspiration from computational neuroscience. In this context, the alternation between gathering information for localization and deciding to move can be indirectly linked, for a robot, to the notion of exploration-exploitation trade-off. We present our contribution to the modeling of this trade-off in animals in a non-stationary task involving different levels of uncertainty, and relate it to multi-armed bandit methods.
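
    One possible reading of "limiting the amount of input data or imposing a more explicit time constraint" is a stopping rule wrapped around an ordinary online learning loop; the sketch below only illustrates that reading with hypothetical names (budgeted_training, max_samples, deadline_s) and is not drawn from the thesis:

        # Hypothetical illustration of a budget as an explicit constraint on learning:
        # stop once either a data budget or a wall-clock budget is spent.
        import time

        def budgeted_training(train_step, data_stream, max_samples=None, deadline_s=None):
            start, used = time.monotonic(), 0
            for sample in data_stream:
                if max_samples is not None and used >= max_samples:
                    break                                # data budget exhausted
                if deadline_s is not None and time.monotonic() - start >= deadline_s:
                    break                                # time budget exhausted
                train_step(sample)
                used += 1
            return used

        running_mean = 0.0
        def train_step(sample):                          # dummy stand-in for a real model update
            global running_mean
            running_mean += 0.01 * (sample - running_mean)

        consumed = budgeted_training(train_step, iter(range(100_000)),
                                     max_samples=1_000, deadline_s=0.5)
        print(f"stopped after {consumed} samples, running_mean={running_mean:.2f}")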

    Sequential Action Selection and Active Sensing for Budgeted Localization in Robot Navigation

    No full text
    Recent years have seen fast growth in the number of applications of Machine Learning algorithms from Computer Science to Robotics. Nevertheless, while most such attempts were successful in maximizing robot performance after a long learning phase, to our knowledge none of them explicitly takes the budget into account in the algorithm evaluation, e.g. a limit on the learning duration or on the maximum number of actions the robot can perform. In this paper, we introduce an algorithm for robot spatial localization based on image classification using a sequential budgeted learning framework, which allows policies to be learned under an explicit budget. Here the model uses a constraint on the number of actions the robot can take. Our approach reduces the problem to a classification task under a budget constraint. We apply this algorithm to a localization problem in a simulated environment. We compare it first to simple neural networks for the classification part and then to different policy-selection techniques. The results show that the model can learn an efficient active sensing policy (i.e. alternating between sensor measurements and movements to gather additional information at different positions) in order to optimize its localization performance under each tested fixed budget. We also show that, with this algorithm, the simulated robot can transfer to other environments with similar properties both the learned policy and the knowledge of which budget gives the best performance/budget ratio. We finally test the algorithm on real navigation data acquired in an indoor environment with the PR2 robot. Altogether, these results suggest a promising framework for budgeted localization in robots that avoids making robots relearn everything from scratch in each new environment.
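
    The transferable "best performance/budget ratio" mentioned above amounts to an argmax of accuracy per action over the budgets tried in a source environment. The snippet below is a toy illustration with placeholder numbers, not data from the paper:

        # Illustration of reusing the best performance/budget ratio across environments.
        # The accuracy figures below are invented placeholders, not results from the paper.
        def best_budget(accuracy_by_budget):
            # Return the budget maximizing accuracy per action spent.
            return max(accuracy_by_budget, key=lambda b: accuracy_by_budget[b] / b)

        source_env = {2: 0.35, 4: 0.80, 8: 0.88, 16: 0.90}   # budget (actions) -> accuracy
        chosen = best_budget(source_env)
        print(f"best performance/budget ratio in the source environment: budget={chosen}")
        # In a target environment with similar properties, the robot would start from
        # 'chosen' (and the transferred policy) instead of sweeping every budget again.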

    Dopamine blockade impairs the exploration-exploitation trade-off in rats

    No full text
    In a volatile environment where rewards are uncertain, successful performance requires a delicate balance between exploitation of the best option and exploration of alternative choices. It has been proposed theoretically that dopamine contributes to the control of this exploration-exploitation trade-off; specifically, the higher the level of tonic dopamine, the more exploitation is favored. We demonstrate here that there is a formal relationship between the rescaling of dopamine positive reward prediction errors and the exploration-exploitation trade-off in simple non-stationary multi-armed bandit tasks. We further show, in rats performing such a task, that systemically antagonizing dopamine receptors greatly increases the number of random choices without affecting learning capacities. Simulations and comparison of a set of different computational models (an extended Q-learning model, a directed exploration model, and a meta-learning model) fitted to each individual confirm that, independently of the model, decreasing dopaminergic activity does not affect the learning rate but is equivalent to an increase in the random exploration rate. This study shows that dopamine could adapt the exploration-exploitation trade-off in decision-making when facing changing environmental contingencies.
    All organisms need to make choices for their survival while confronted with uncertainty in their environment. Animals and humans tend to exploit actions likely to provide desirable outcomes, but they must also take into account the possibility that environmental contingencies and the outcomes of their actions may vary over time. Behavioral flexibility is thus needed in volatile environments in order to detect and learn new contingencies [1]. This requires a delicate balance between exploitation of known resources and exploration of alternative options that may have become advantageous. How this exploration-exploitation dilemma may be resolved and regulated is still a subject of active research in Neuroscience and Machine Learning [2-5]. Dopamine holds a fundamental place in contemporary theories of learning and decision-making. The temporal evolution of phasic dopamine signals across learning has been extensively replicated and is most of the time considered as evidence of a role in learning [6-8], but see alternative views in Coddington et al. [9]. Dopamine reward prediction error (RPE) signals have been identified in a variety of instrumental and Pavlovian conditioning tasks [10-13]. They affect plasticity and action-value learning in cortico-basal networks [14-16] and have been directly related to behavioral adaptation in a number of decision-making tasks in humans, non-human primates [17] and rodents [18-21]. Accordingly, it is commonly assumed that manipulations of dopamine activity affect the rate of learning, but this could be a misconception. Besides learning, the role of dopamine in the control of behavioral performance is still unclear. Dopamine is known to modulate incentive choice (the tendency to differentially weigh costs and benefits) [22,23] and risk-taking behavior [24], as well as other motivational aspects such as effort and response vigour [25]. Because dopamine is one of the key factors that may encode success or uncertainty, it might modulate decisions by biasing them toward options that present the largest uncertainty [26,27], which would correspond to a "directed" exploration strategy [5,28,29]. Alternatively, success and failure could affect tonic dopamine levels and control random exploration of all options, as recently proposed by Humphries et al. [30].
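
    In the extended Q-learning account referred to above, reduced dopaminergic activity maps onto a lower softmax inverse temperature while the learning rate is unchanged. The generic textbook simulation below illustrates that separation on a toy non-stationary three-armed bandit; it is not the authors' fitted model:

        # Toy non-stationary 3-armed bandit played with softmax (Boltzmann) exploration.
        import math, random

        def run_bandit(beta, alpha=0.1, n_trials=2000, seed=0):
            rng = random.Random(seed)
            probs = [0.8, 0.2, 0.2]                 # arm reward probabilities
            q = [0.0, 0.0, 0.0]
            best_choices = 0
            for t in range(n_trials):
                if t > 0 and t % 500 == 0:
                    rng.shuffle(probs)              # contingencies change: the best arm moves
                weights = [math.exp(beta * v) for v in q]
                choice = rng.choices(range(3), weights=weights)[0]   # softmax action selection
                reward = 1.0 if rng.random() < probs[choice] else 0.0
                q[choice] += alpha * (reward - q[choice])            # learning rate identical across conditions
                best_choices += probs[choice] == max(probs)
            return best_choices / n_trials

        # Lowering beta (a proxy for reduced tonic dopamine in this account) only makes
        # choices noisier; alpha, hence learning speed, is the same in both runs.
        for beta in (5.0, 1.0):
            print(f"beta={beta}: best arm chosen on {run_bandit(beta):.0%} of trials")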